Bank Case Study

Author: Evan Condalary

Goals:

  1. Investigate sample data set with descriptive stats / visualizations
  2. Find most explanatory features for y to build intuition
  3. Use simple imputation for missing categorical values
  4. Evaluate several appropriate ML model candidates on sample
  5. Tune hyper-parameters for the top model candidates from step 4
  6. Evaluate performance of best model on sample
  7. Identify best cutoff probability for positive prediction
  8. Explore most explanatory features from final model
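
As an illustration of step 7, the sketch below scans a grid of candidate cutoffs and keeps the one with the highest F1 score. It is a minimal, dependency-free example, not the method used in this case study: the choice of F1 as the selection metric, the 0.05-step grid, the helper names `f1_at_cutoff` and `best_cutoff`, and the toy labels and probabilities are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def f1_at_cutoff(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float) -> float:
    """F1 score when every probability >= cutoff is predicted positive."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


def best_cutoff(
    y_true: Sequence[int], y_prob: Sequence[float], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (cutoff, f1) for the candidate with the highest F1.

    Ties go to the earliest candidate in the sequence.
    """
    best = (candidates[0], f1_at_cutoff(y_true, y_prob, candidates[0]))
    for cutoff in candidates[1:]:
        score = f1_at_cutoff(y_true, y_prob, cutoff)
        if score > best[1]:
            best = (cutoff, score)
    return best


# Toy data (hypothetical): true labels and model-predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.04, 0.22, 0.37, 0.42, 0.58, 0.71, 0.13, 0.91]

# Assumed candidate grid: 0.05, 0.10, ..., 0.95.
candidates = [i / 20 for i in range(1, 20)]

cutoff, score = best_cutoff(y_true, y_prob, candidates)
print(f"best cutoff = {cutoff:.2f}, F1 = {score:.3f}")
```

In practice the cutoff would be chosen on held-out data, and a different criterion (for example, a cost-weighted trade-off between false positives and false negatives) may suit the business problem better than F1.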

Model Metrics:

Given more time/resources:

  1. Explore more complex imputation methods for categorical variables
  2. Do more thorough outlier analysis / pruning based on example characteristics
  3. Evaluate model performance by various features of interest (e.g., age bands or other demographic features)
  4. Identify better scaling methods for each numeric variable
  5. Create larger hyper-parameter tuning space for candidate models (and tune more candidates)
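
Item 3 above (evaluating performance by features of interest) could be approached along the following lines. This is only a sketch under stated assumptions: the age-band boundaries, the use of accuracy as the per-band metric, the helper names `age_band` and `accuracy_by_band`, and the toy data are all hypothetical and not taken from the case study.

```python
from collections import defaultdict
from typing import Dict, Sequence

# Hypothetical age bands as (low, high, label), bounds inclusive.
AGE_BANDS = [
    (0, 29, "under 30"),
    (30, 44, "30-44"),
    (45, 59, "45-59"),
    (60, 200, "60+"),
]


def age_band(age: int) -> str:
    """Map an age to the label of the band that contains it."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside all defined bands")


def accuracy_by_band(
    ages: Sequence[int], y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy separately for each age band present in the data."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for age, truth, pred in zip(ages, y_true, y_pred):
        band = age_band(age)
        total[band] += 1
        correct[band] += int(truth == pred)
    return {band: correct[band] / total[band] for band in total}


# Toy data (hypothetical): ages, true labels, and thresholded predictions.
ages = [25, 28, 35, 41, 52, 58, 63, 70]
y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

results = accuracy_by_band(ages, y_true, y_pred)
for band, acc in results.items():
    print(f"{band}: accuracy = {acc:.2f}")
```

Large gaps between bands would suggest the model serves some segments worse than others; with small bands, the per-band estimates are noisy and should be read with that in mind.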